Regression Imputation for Skewed Multivariate Data using Copula Transformation

Zhixin Lun, Oakland University

Missing data are a common phenomenon in data analysis. Imputation is a flexible method for handling missing-data problems because it makes efficient use of all the information available in the rest of the data. In effect, we predict the missing values from the observed data, so the quality of that prediction plays a critical role in any imputation methodology. Most of the available imputation methodology assumes that the data follow a multivariate normal distribution; when the data are skewed, these methods may not be very effective. To handle data that may have a non-normal distribution, we propose an approach based on the copula transformation recently introduced by Bahuguna and Khattree (2018, A Generic All Purpose Transformation for Multivariate Modeling through Copulas, Preprint). We demonstrate that, under mild assumptions, the copula transformation can be used successfully to impute skewed multivariate data. In this talk, we confine ourselves to regression methods for imputation under the assumption that data are missing completely at random (MCAR). Through extensive simulations with various probability densities and different correlation structures, we study and compare the performance of our approach with that of the approach in which multivariate normality is (incorrectly) assumed. The simulations show that the new approach performs considerably better for imputing missing values, yielding a smaller average sum of squared residuals. Further, the percentage of replications in which our approach gives the smaller sum of squared residuals is, in almost all settings, considerably greater than 50 percent.
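The abstract does not spell out the Bahuguna and Khattree transformation, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It swaps in a Gaussian-copula-style rank transformation (normal scores, Φ⁻¹(rank/(n+1))), fits a regression on the transformed scale, and maps predictions back through the empirical quantiles of the observed values. The setting is assumed to be bivariate: X fully observed, Y missing completely at random, and skewed (lognormal) data. It compares this against ordinary regression imputation on the raw scale, which implicitly relies on normal-theory linearity, using the two criteria named in the abstract: the average sum of squared residuals over replications and the percentage of replications in which the copula-based approach gives the smaller sum. All function names and parameter values (`normal_scores`, `impute_copula`, `simulate`, `rho=0.7`, `miss_rate=0.2`, and so on) are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()

def normal_scores(values):
    """Rank-based normal scores Phi^{-1}(rank/(n+1)); ties are broken arbitrarily."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores

def empirical_quantile(sorted_vals, u):
    """Inverse of the rank/(n+1) plotting positions, linearly interpolated and clamped."""
    n = len(sorted_vals)
    pos = u * (n + 1) - 1          # 0-based fractional index
    if pos <= 0:
        return sorted_vals[0]
    if pos >= n - 1:
        return sorted_vals[-1]
    lo = int(pos)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[lo + 1] - sorted_vals[lo])

def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def impute_normal(xs, ys, missing):
    """Regression imputation on the raw scale (normal-theory linear predictor)."""
    obs = [i for i, m in enumerate(missing) if not m]
    a, b = fit_line([xs[i] for i in obs], [ys[i] for i in obs])
    return [a + b * xs[i] for i, m in enumerate(missing) if m]

def impute_copula(xs, ys, missing):
    """Regression imputation on the normal-score scale, mapped back to Y's scale."""
    obs = [i for i, m in enumerate(missing) if not m]
    zx = normal_scores(xs)                 # X is fully observed
    y_obs = [ys[i] for i in obs]
    zy_obs = normal_scores(y_obs)          # scores from observed Y only, in obs order
    a, b = fit_line([zx[i] for i in obs], zy_obs)
    y_sorted = sorted(y_obs)
    return [empirical_quantile(y_sorted, _STD_NORMAL.cdf(a + b * zx[i]))
            for i, m in enumerate(missing) if m]

def simulate(n=200, rho=0.7, miss_rate=0.2, reps=200, seed=1):
    """Compare both imputations on lognormal data with MCAR missingness in Y."""
    rng = random.Random(seed)
    ssr_norm, ssr_cop = [], []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1 = rng.gauss(0, 1)
            z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
            xs.append(math.exp(z1))
            ys.append(math.exp(z2))
        missing = [rng.random() < miss_rate for _ in range(n)]   # MCAR
        miss_idx = [i for i, m in enumerate(missing) if m]
        if not miss_idx or n - len(miss_idx) < 3:
            continue                       # skip degenerate replications
        for imputer, store in ((impute_normal, ssr_norm), (impute_copula, ssr_cop)):
            preds = imputer(xs, ys, missing)
            store.append(sum((ys[i] - p) ** 2 for i, p in zip(miss_idx, preds)))
    if not ssr_norm:
        raise ValueError("no usable replications")
    wins = sum(c < s for c, s in zip(ssr_cop, ssr_norm))
    return {
        "avg_ssr_normal": fmean(ssr_norm),
        "avg_ssr_copula": fmean(ssr_cop),
        "pct_copula_better": 100.0 * wins / len(ssr_norm),
    }

if __name__ == "__main__":
    print(simulate())
```

Because the back-transform uses the empirical quantiles of the observed Y values, imputed values never fall outside the observed range, and the predicted mean on the normal-score scale maps to something closer to a conditional median than a conditional mean on the original scale. Both are design choices of this sketch, not claims about the approach described in the talk.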